Incorporating durational modification in voice transformation

Authors

  • Arthur R. Toth
  • Alan W. Black
Abstract

Voice transformation is the process of using a small amount of speech data from a target speaker to build a transformation model that can be used to generate arbitrary speech that sounds like the target speaker. One common current technique is to build Gaussian Mixture Models that map spectral aspects from source to target speakers. This paper proposes the use of duration models to improve the transformation models and output speech quality. Testing across seven target speakers shows a statistically significant improvement in a popular objective metric when duration modification is performed during both training and testing of a Gaussian Mixture Model mapping-based voice transformation system.

Related papers

Impact of durational outlier removal from unit selection catalogs

Outlier removal is a straightforward technique for improving the quality of unit selection catalogs without hand correction. This paper investigates the use of phone durations as a criterion for removing bad units. Scoring conditioned on linguistic context is demonstrably better than statistics based on phone class alone. The impact of voice modification is evaluated with a 444K-utterance test corpus.

Prosodic Cues for Hesitation

In our efforts to model spontaneous speech for use in, for example, spoken dialogue systems, a series of experiments has been conducted to investigate correlates of perceived hesitation. Previous work has shown that it is the total duration increase that is the valid cue rather than the contribution by either of the two factors pause duration and final lengthening. In the present expe...

F0 transformation within the voice conversion framework

In this paper, several experiments on F0 transformation within the voice conversion framework are presented. The conversion system is based on a probabilistic transformation of line spectral frequencies and residual prediction. Three probabilistic methods of instantaneous F0 transformation are described and compared. Moreover, a new modification of inter-speaker residual prediction is proposed ...

Voice Impersonation using Generative Adversarial Networks

Voice impersonation is not the same as voice transformation, although the latter is an essential element of it. In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker. In this paper, we propose a novel...

Durational Correlates of Word-initial Voiceless Geminate Stops: The Case of Kelantan Malay

This paper investigates the production of word-initial geminate consonants in Kelantan Malay with a focus on voiceless stops. It presents an acoustic phonetic analysis examining two acoustic parameters: closure duration and voice onset time (VOT). Evidence from a production experiment indicates that there is a clear durational contrast between word-initial voiceless geminate stops and their sing...

Publication date: 2008